Dependence of Protein Structure on Environment: FOD Model Applied to Membrane Proteins

The natural environment of proteins is the polar aquatic environment and the hydrophobic (amphipathic) environment of the membrane. The fuzzy oil drop model (FOD) used to characterize water-soluble proteins, as well as its modified version FOD-M, enables a mathematical description of the presence and influence of diverse environments on protein structure. The present work characterized the structures of membrane proteins, including those that act as channels, and a water-soluble protein for contrast. The purpose of the analysis was to verify the possibility that an external force field can be used in the simulation of the protein-folding process, taking into account the diverse nature of the environment that guarantees a structure showing biological activity.


Introduction
The aquatic environment is a natural environment for proteins and one that conditions biological activity. The vast majority of proteins are water-soluble proteins. Another equally important environment is that of the cell membrane, the characteristics of which (high hydrophobicity) are radically different to those of polar water.
The mechanisms that maintain a cell's homeostasis in an environment with variable characteristics are critical to the lives of bacteria. The main task is osmoregulation, i.e., the maintenance of a constant level of osmotic pressure. The main goal is to stabilize the electrolyte concentration, leading to fluid balance. An increased transport of water towards environments with higher osmotic pressure can be observed. Such increased transport can result in cell rupture. This process is controlled by proteins called Conductance Mechanosensitive Ion Channels (Msc) [1][2][3][4][5]. Due to their construction, a distinction is made between small conductance mechanosensitive channels (MscS) and large ones (MscL) [6][7][8].
The present work focused on membrane proteins from the group of small conductance mechanosensitive channels (MscS). These proteins enable bacteria to survive under the threat of an osmotic downshock: opening the pore allows cellular contents to be expelled, preventing cellular rupture. Two molecules, the structures of which were stabilized with a detergent, were analyzed. The aim of the study was to verify the application of the fuzzy oil drop model to assess the structures of individual domains of the extensive structure of the MscS protein. To assess the structures of the membrane proteins, a modified version of the FOD model (FOD-M) was used, taking into account the presence of an environment not limited only to the aquatic environment [9]. The FOD model assumes that a polypeptide chain composed of bipolar molecules aims to create a micelle-like structure, concentrating hydrophobic residues in the center with simultaneous exposure of polar residues on the surface [10].
However, obtaining an ideal distribution adapted to the polar environment of water is limited by the lack of freedom of movement of individual amino acids (covalent peptide bonds). Therefore, alignment according to the spherical micelle model is achieved by proteins to different extents. Local deviations from the idealized distribution have been shown to be related to biological activity; a local excess of hydrophobicity is used to build multichain complexes [11], and a local deficit for the selective binding of ligands or substrate [12].
The quantitative measurement of these deviations is determined by using the 3D Gaussian function, which, spread over the protein body, represents an idealized distribution T. The O distribution observed in the protein, resulting from hydrophobic interactions between amino acids, can be compared with the idealized distribution to reveal the mentioned differences.
The 3D Gaussian (3DG) function represents the maximum concentration of hydrophobicity in the center of the molecule, with values close to zero on the surface (within 3Sigma from the center). Such conditions favor the solubility of the protein [10].
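The idealized distribution T described above can be illustrated with a minimal sketch. It assumes that the molecule is centered at the centroid of its effective atoms and that sigma along each axis equals one third of the largest extent along that axis, so that the surface lies at about 3 sigma. The function name, the sigma rule, and the omission of any reorientation of the molecule are assumptions made for illustration and not the authors' implementation.

```python
import math

def theoretical_distribution(coords):
    """Normalized idealized hydrophobicity T_i at each effective-atom position."""
    n = len(coords)
    # Center the molecule at the centroid of its effective atoms.
    center = [sum(p[k] for p in coords) / n for k in range(3)]
    shifted = [[p[k] - center[k] for k in range(3)] for p in coords]
    # Assumption: sigma per axis is one third of the largest extent along that
    # axis; a flat axis falls back to 1.0 to avoid division by zero.
    sigma = [(max(abs(p[k]) for p in shifted) / 3.0) or 1.0 for k in range(3)]
    # Unnormalized 3D Gaussian value at each position.
    raw = [
        math.exp(-sum(p[k] ** 2 / (2.0 * sigma[k] ** 2) for k in range(3)))
        for p in shifted
    ]
    total = sum(raw)
    return [v / total for v in raw]

# Hypothetical coordinates: one central point and six points at 3 sigma.
coords = [(0, 0, 0), (3, 0, 0), (-3, 0, 0), (0, 3, 0), (0, -3, 0), (0, 0, 3), (0, 0, -3)]
t = theoretical_distribution(coords)
```

In this toy arrangement the central position receives the highest value and the six outer positions receive equal, near-zero values, which is the behavior the 3DG function is meant to express.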
The different environment of the membrane requires the opposite situation in the form of exposure of hydrophobic residues on the surface (contact with the membrane) and, in the case of the channel, free space and the presence of polar residues on the surface of the channel in the central part of the molecule. Therefore, the FOD-M model uses a function of the form 1-3DG to describe such a hydrophobicity distribution. The comparison of the distribution observed in the protein (hydrophobic inter-amino-acid interactions) in the membrane protein with the above-mentioned function can be expressed quantitatively by assessing the status of a given molecule [13].
In membrane proteins, in addition to the domain fully anchored to the membrane, there are outwardly exposed domains (assuming an aqueous environment) that do not exhibit the characteristics of a membrane domain. Therefore, the analysis of such a molecule enables the application of an appropriate model and its verification.
The description of the calculation procedure is given in the Materials and Methods. The work presents a tool for the assessment of the structure of proteins, including membrane proteins in particular. The method may prove useful in research on the properties of membrane proteins, the analysis of which is difficult due to their insolubility.

Experimental Section
The proteins that were the subjects of the current analysis (Table 1) were selected to represent the structures of membrane proteins serving as channels. They were chosen for their different structural and functional forms in order to reveal differences in the interpretation of the results obtained with the fuzzy oil drop model. Despite performing a similar function, the channels for the transport of various compounds take different structural forms. The list also includes a protein showing the arrangement typical of water-soluble proteins, with a high match of the hydrophobicity distribution (O) to the assumed distribution (T) in the FOD model in accordance with 3DG. This protein was included as a reference example enabling comparative analysis for the discussed membrane proteins.
The inclusion of MsbA proteins, the structure of which was obtained using peptidiscs, was intended to verify their role as a factor enabling the solubility of membrane proteins. For the purposes of the present analysis, they were used as an example to validate the applicability of the FOD-M model to a wide range of membrane-anchored protein (or membrane-like environment) structures. The BRSCT protein is a representative of soluble proteins, and therefore opposed to membrane proteins. Its presence was intended to enable comparative analysis for the application of the FOD and FOD-M models.

Description of the FOD Model and Its Modifications FOD-M
This model has been described many times [10]. A brief description is provided here to assist with the interpretation of the results.
The FOD model is an extended version of the oil drop model introduced by Kauzmann [17]. He assumed that the protein is composed of two layers: the outer polar layer and the inner hydrophobic layer. This discrete model was extended to a continuous form by introducing a 3D Gaussian function into the description of the hydrophobicity distribution. This function reflects the high concentration of hydrophobicity in the center of the molecule and the zero level of hydrophobicity at the surface. The values of hydrophobicity decrease from the maximum in the center as we move away from the center. The 3D Gaussian function spread over the body of the protein allows determination of the idealized level of hydrophobicity in the positions of effective atoms (averaged positions of atoms making up the amino acid). This distribution, referred to as T, compared with the actual level resulting from inter-amino-acid interactions, referred to as O, reveals the degree of similarity and allows for the identification of locations with different characteristics [18]. The O distribution expresses dependence on the distance between amino acids and their intrinsic hydrophobicity. Any hydrophobicity scale can be applied [10]. After the T and O distributions are normalized, it is possible to quantify their degree of similarity. This assessment is performed by using the Kullback-Leibler divergence entropy D_KL [19] and introducing a second reference distribution R, where each residue is assigned a hydrophobicity level of 1/n, where n is the number of amino acids in the chain. Such a distribution, contrary to the distribution described by the 3DG function, does not differentiate the levels of hydrophobicity in any way. Therefore, the protein is described with two D_KL values: one for the O|T relation and one for the O|R relation. Comparing these values allows evaluation of the distances between distributions.
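The dependence of the O distribution on inter-residue distance and intrinsic hydrophobicity can be illustrated with a short sketch. The text does not give the functional form, so the code assumes the distance-weighting function and the 9 Å cutoff commonly quoted in the FOD literature, and it assumes that each residue collects contributions only from residues other than itself. The hydrophobicity values and coordinates are hypothetical and serve only as an example.

```python
import math

CUTOFF = 9.0  # assumed interaction cutoff in angstroms

def distance_weight(r, c=CUTOFF):
    """Assumed weighting: 1 at r = 0, falling smoothly to 0 at r = c."""
    if r > c:
        return 0.0
    q = (r / c) ** 2
    return 1.0 - 0.5 * (7.0 * q - 9.0 * q ** 2 + 5.0 * q ** 3 - q ** 4)

def observed_distribution(coords, hydrophobicity):
    """Normalized observed hydrophobicity O_i for each effective atom."""
    raw = []
    for i, (pi, hi) in enumerate(zip(coords, hydrophobicity)):
        total = 0.0
        for j, (pj, hj) in enumerate(zip(coords, hydrophobicity)):
            if i == j:
                continue  # assumption: no self-interaction term
            total += (hi + hj) * distance_weight(math.dist(pi, pj))
        raw.append(total)
    norm = sum(raw)
    return [v / norm for v in raw]

# Hypothetical example: three residues on a line, 4 angstroms apart.
coords = [(0.0, 0.0, 0.0), (4.0, 0.0, 0.0), (8.0, 0.0, 0.0)]
scale = [0.9, 0.5, 0.1]  # hypothetical non-negative hydrophobicity values
o = observed_distribution(coords, scale)
```

In this example the middle residue, which has two close neighbors, receives the largest share of the observed hydrophobicity.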
A protein for which D_KL(O|T) < D_KL(O|R) is interpreted as having a hydrophobic concentration in the form of a hydrophobic core. To avoid using two values, the parameter RD (Relative Distance), defined as follows, was introduced:
$$RD = \frac{D_{KL}(O|T)}{D_{KL}(O|T) + D_{KL}(O|R)}$$
An RD value > 0.5 indicates the presence of generally micelle-like deformation. Identification of positions or segments on the T and O profiles with divergent courses allows the identification of amino acids and their roles, i.e., the biological function of a given protein.
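The comparison of the two divergence values can be sketched as follows. The code assumes the standard Kullback-Leibler form with a base-2 logarithm and computes RD as the share of D_KL(O|T) in the sum of both distances. The function names and the numerical profiles are illustrative assumptions.

```python
import math

def kl_divergence(p, q):
    """Kullback-Leibler divergence of p from q (both normalized, q > 0)."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)

def relative_distance(o, t):
    """RD = D_KL(O|T) / (D_KL(O|T) + D_KL(O|R)), with R uniform (1/n)."""
    n = len(o)
    r = [1.0 / n] * n
    d_ot = kl_divergence(o, t)
    d_or = kl_divergence(o, r)
    return d_ot / (d_ot + d_or)

# Hypothetical normalized profiles for a three-residue toy chain.
o = [0.60, 0.30, 0.10]
t = [0.55, 0.30, 0.15]
rd = relative_distance(o, t)
```

For these toy profiles O lies much closer to T than to the uniform distribution R, so RD falls well below 0.5. Note that RD is undefined when O coincides with both T and R, a case this sketch does not guard against.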
The hydrophobic environment of the membrane requires a completely different arrangement of the hydrophobicity in a protein in order to be able to permanently interact with the surrounding membrane. Here, the protein should exhibit hydrophobicity on the surface, and, in the case of a channel, polarity in the center (or, due to the free space of the channel, low hydrophobicity). Therefore, to describe the hydrophobicity distribution in a protein domain anchored in the membrane, a function was proposed in the FOD-M model that complements the 3D Gaussian function. For calculation purposes, the "inverse" distribution is defined as follows:
$$M_i = T_{MAX} - T_i$$
where T_MAX is the maximum value in the T distribution. This distribution M_i should be normalized:
$$(M_i)_n = \frac{T_{MAX} - T_i}{\sum_{j} (T_{MAX} - T_j)}$$
where the index n denotes the normalization of the distribution. In fact, the distribution of the membrane-anchored domain is not a simple M_i distribution, but a combination of the T distribution and the M distribution.
Finally, taking into account the possibility of a variable proportion between the distribution T and M, the outer field is defined as follows:
$$M_i = \left[ T_i + K \cdot (T_{MAX} - T_i)_n \right]_n$$
where K is the parameter defining the contribution of the factor expressing the "modification" of the field based on the micelle-like distribution.
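A minimal sketch of the outer field for a given K is shown here. It assumes that the field is obtained by normalizing the "inverse" distribution T_MAX - T_i, adding it to T with weight K, and normalizing the result again. The function names and the example profile are assumptions introduced for illustration.

```python
def normalize(values):
    """Scale a list of non-negative values so that they sum to 1."""
    total = sum(values)
    return [v / total for v in values]

def modified_field(t, k):
    """Outer field M for parameter K: normalized T plus K times normalized inverse."""
    t = normalize(t)
    t_max = max(t)
    # "Inverse" distribution: highest where T is lowest (assumes T is not flat).
    inverse = normalize([t_max - ti for ti in t])
    return normalize([ti + k * mi for ti, mi in zip(t, inverse)])

# Hypothetical T profile for a three-residue toy chain.
t = [0.5, 0.3, 0.2]
m_water = modified_field(t, 0.0)     # K = 0 reproduces T
m_membrane = modified_field(t, 1.0)  # K = 1 mixes T and its inverse equally
```

With K = 0 the field equals T, while increasing K shifts the expected hydrophobicity from the center toward the surface.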
All the examples of proteins described later in the results are expressed by the values of the RD parameter and the K parameter. The RD parameter is interpreted as the degree of matching of the O distribution to the T distribution in relation to the relative distribution of R. It should be noted that this compliance indicates the solubility of the protein in water. On the other hand, the parameter K denotes the degree to which the nonpolar environment (including hydrophobicity in particular) is involved in the generation of a structure with a specific ordering present in the protein.
The value of K for a given set of T and O profiles can be found by searching for the distribution M for which D_KL for the relation O|M is minimal. The graphic presentation of the model is shown in Figure 1.
It is apparent in light of the research conducted so far that the value of K = 0, meaning the exclusive influence of the aquatic environment, occurs in only a few proteins (including downhill and fast-folding proteins [20]). Most soluble proteins have a K value in the range of 0.2-0.5. In contrast, K values close to 1 are found in membrane proteins that do not act as channels. As is shown in the analysis below, membrane proteins, including those serving as channels, show values of K > 1. This means that the dominant factor in shaping the protein structure is an environment different from the aqueous one, particularly a membrane one.
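The search for K mentioned above, i.e., finding the distribution M that minimizes D_KL for the O|M relation, can be sketched as a simple scan over candidate K values. The grid range, the step of 0.1, and all function names are assumptions; the text does not specify how the minimum is located.

```python
import math

def normalize(values):
    total = sum(values)
    return [v / total for v in values]

def kl_divergence(p, q):
    """Kullback-Leibler divergence of p from q (both normalized, q > 0)."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)

def modified_field(t, k):
    """Outer field M: normalized T plus K times its normalized inverse."""
    t = normalize(t)
    t_max = max(t)
    inverse = normalize([t_max - ti for ti in t])
    return normalize([ti + k * mi for ti, mi in zip(t, inverse)])

def find_k(o, t, k_max=3.0, step=0.1):
    """Return (K, D_KL) for the grid value of K that minimizes D_KL(O|M)."""
    best_k, best_d = 0.0, float("inf")
    for i in range(int(round(k_max / step)) + 1):
        k = i * step
        d = kl_divergence(o, modified_field(t, k))
        if d < best_d:
            best_k, best_d = k, d
    return best_k, best_d

# Hypothetical check: build O from a known K and recover that K.
t = [0.5, 0.3, 0.2]
o = modified_field(t, 0.7)
k_opt, d_min = find_k(o, t)
```

A grid scan is used here only because it is easy to follow; any one-dimensional minimizer could take its place.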

Tools Used
There are two possible routes of access to the program: The program allowing calculation of RD is accessible upon request on the CodeOcean platform: https://codeocean.com/capsule/3084411/tree (accessed on 15 December 2021). Please contact the corresponding author to get access to your private program instance.
In order to ensure reproducibility of results and provide easy access to the computations discussed in this paper, the authors have also implemented an online tool with which FOD computations can be performed on arbitrary protein structures, including the structures discussed in this paper. The application, implemented in collaboration with the Sano Centre for Computational Medicine (https://sano.science (accessed on 15 December 2021)) and running on resources contributed by ACC Cyfronet AGH (https://www.cyfronet.pl (accessed on 15 December 2021)) in the framework of the PL-Grid Infrastructure (https://plgrid.pl (accessed on 15 December 2021)), provides a web wrapper for the abovementioned computational component and is freely available at https://hphob.sano.science (accessed on 15 December 2021).
The tool enables users to select a protein structure by entering its PDB identifier, to select specific parts of the protein (including individual chains and secondary folds, all the way down to individual residues), and finally to run the FOD computation on the selected fragments in order to obtain RD and hydrophobicity distribution data (Figure 2).

Analysis of Exemplary MscS Proteins
The characteristics of the HpMscS and EcMscS proteins are presented in Table 2, wherein the values of the RD and K parameters are given. As the characteristics of the two analyzed MscS proteins were similar, only HpMscS (4HW9) is discussed in detail in the following sections. In order to facilitate the identification of its individual components, appropriate nomenclature is proposed in Figure 3, where the system to identify domains is presented.
The designation DD# identifies a set of corresponding domains as part of a complex, while the designation D# expresses the status of a domain treated as an individual structural unit. In the case of DD#, the Gaussian function was defined for a set of seven relevant domains. In the case of D#, the Gaussian function was defined for an individual structural unit. The others indicate the status of the fragment as a component mentioned in the header.
The values of the RD and K parameters given in the section entitled "COMPLEX" express the statuses of the complexes formed by the set of all chains and the seven domains, respectively. A 3D Gaussian function (DD# designation) was generated for the whole complex and for the complexes produced by a given set of domains.
The status of the domain set and the chain as part of the entire complex is also given (in the part of the table entitled "Fragments in complex"). The given values define the contributions and roles of individual fragments as components of the complex structure. A separate Gaussian function was not generated to determine this status. This status was determined after comparative analysis of the fragments of the T and O distributions for the selected part of the complex.
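The way a fragment status is derived from the profiles of the whole complex can be sketched as follows: the T and O values of the selected residues are cut out of the complex-level profiles, each fragment is normalized on its own, and RD is computed for the fragment. The index-based selection, the function names, and the numerical profiles are assumptions introduced for illustration.

```python
import math

def normalize(values):
    total = sum(values)
    return [v / total for v in values]

def kl_divergence(p, q):
    """Kullback-Leibler divergence of p from q (both normalized, q > 0)."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)

def fragment_rd(t_profile, o_profile, start, stop):
    """RD of residues [start, stop) using profiles computed for the whole complex."""
    # No new Gaussian is generated: the fragment reuses the complex-level values.
    t_frag = normalize(t_profile[start:stop])
    o_frag = normalize(o_profile[start:stop])
    uniform = [1.0 / len(o_frag)] * len(o_frag)
    d_ot = kl_divergence(o_frag, t_frag)
    d_or = kl_divergence(o_frag, uniform)
    return d_ot / (d_ot + d_or)

# Hypothetical complex-level profiles for a five-residue toy chain.
t_complex = [0.40, 0.30, 0.10, 0.10, 0.10]
o_complex = [0.35, 0.30, 0.15, 0.10, 0.10]
rd_fragment = fragment_rd(t_complex, o_complex, 0, 3)
```

Because the fragment is renormalized separately, its RD describes only the local agreement between T and O and can differ from the RD of the complex as a whole.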
Similarly, in the section entitled "Chain individual", the status of a single chain (3D Gauss generated for the chain) is given, while the values given in this part of the table define the statuses of individual domains as members of a single chain.
The section "Domains individual" gives the statuses of domains treated as individual structural units. A 3D Gaussian function was generated for each of them. Domains treated in this way are referred to as D#.
Analysis of the T, O, and M profiles for the complex (represented by the A chain to avoid duplication of the same profile seven times) showed elevated levels in the N-terminal segment, i.e., the DD1 domain (Figure 4A). The presence of a channel was clearly marked as a hydrophobicity deficit in the locations where the maximum hydrophobicity concentration was expected. The C-terminal segment showed a relative alignment of the levels of T and O. This is the region of the domains beyond the membrane. A high value of K > 1 implies the need for a significant modification of the target distribution, a distribution that expresses the characteristics of an external field that differs from the idealized field as defined for globular proteins. A significant excess of hydrophobicity in the N-terminal section (DD1 domain region) suggested the participation of a hydrophobic environment in the generation of the structural form. The central part showed a definite hydrophobicity deficit, indicating the presence of large channel chambers. The C-terminal segment (about 1/3 of the chain length) showed a relative agreement of the T and O distributions.
The set of profiles calculated for the domain defined as DD1 (Figure 4(B1)) (set of seven chain fragments) showed excess hydrophobicity on the N-terminal fragment itself, but also precisely determined the presence of a channel in the segment with a significantly underestimated hydrophobicity (C-terminal fragment of this domain). In relation to the status of the complex, a decrease in the value of K was observed.
The characteristics of the DD2 domain (Figure 4(B2)) showed the presence of the channel, although it was definitely more clearly visible on the profiles shown in Figure 4A. The DD3 domain seemed to represent the status with the distribution O closest to the expected T distribution (Figure 4(B3)), although the presence of the channel in the form of a local hydrophobicity deficit was visible.
The DD2 and DD3 domains showed reduced K values which were relatively high compared to the statuses of the soluble proteins. This was mainly due to the presence of a channel expressed as a significant local deficit in hydrophobicity.
The DD4 domain deserves special attention (Figure 5). It consisted of a set of seven short segments with a beta structure forming a typical beta-barrel (Figure 3D). From the point of view of the FOD model, it represents a distribution typical of water-soluble globular proteins, which is indicated by the high compatibility of the T and O distributions expressed by the low value of RD and K = 0.
In summarizing the characteristics of the complex and components in the form of complexes composed of domains, it should be noted that from the point of view of the complex, the characteristics of the set of T and O profiles showed a specific system in which the identification of the membrane domain was clear and the presence of the channel was also unambiguous.
The status of the domain set (DD) seemed to be more mutually ordered, where, for example, the excess of hydrophobicity observed in the T, O, and M profiles (Figure 4A) was used in the DD1 domain complex for interchain interactions, representing the excess hydrophobicity in the N-terminal and C-terminal sections.
Characteristics of the structure of a single chain indicated a folding significantly deviating from the globular system with a clear exposure of hydrophobicity in the N-terminal segment and a substantial deficit in the middle segment, with a relatively matched O distribution in the C-terminal segment (Figure 6A). Such a distribution with a very high value of K = 2.1 means that this structure could not be achieved in an aquatic environment. The high value of K suggests a significant share of environmental factors with changed characteristics in relation to the aquatic environment.
The dissimilarity of the membrane domain in the form of both the DD1 and D1 complexes results from a marked excess of hydrophobicity over the entire section of this domain. This was present both in the form of a complex and in the single domain. Very high K values indicated a significant share of the environment and environment-modifying factor for this domain. This suggests the need for the direct presence of a membrane t[...]
The part of Table 2 referred to as "Fragments in complex" gives the characteristics of the fragments mentioned, which constituted components of the entire structure of the complex. In other words, the parameters presented herein determine the local roles played by the given fragments. The values of RD and K were obtained by normalizing selected fragments from the profiles obtained for the complex. The statuses of these fragments appeared to be comparable to that of the set of domains (denoted as DD) with the exception of DD4, which occupied a surface location where, as shown by the profile set (Figure 4A), low hydrophobicity was expected.
The next analysis concerned the status of a single chain (first line of the "Individual chain" part of Table 2). The values of the RD and K parameters indicate that the chain structure was far from the statuses represented by globular proteins. The stretched form with only locally marked higher packings was in no way close to the micelle-like form that is expected for a chain folding in an aqueous environment. The high incompatibility of the O distribution with the T distribution was due to a significant excess of hydrophobicity in the N-terminal part. There was a clear deficit of the expected high concentration of hydrophobicity in the central part of the chain. A relatively similar distribution of T to the distribution of O was observed in the C-terminal part (Figure 6A). Local high levels of excess hydrophobicity are likely partly used for interchain interaction in both the N- and C-terminal fragments. The central section, showing a significant deficit in hydrophobicity, probably consumed the hydrophobic residues present there in part for interchain interaction.
The value of K = 2.1, which determined the status of the discussed chain, suggests complete independence from the aquatic environment. The M distribution, which almost took the form of the R distribution, also drew attention. This situation is discussed later in this work.
The analysis of the statuses of individual domains treated as individual structural units suggested the course of the chain-folding process. The domains D2 and D3 (Figure 6(B2,B3)) form a micelle-like system with a relatively high consistency of the T and O distributions at low K values. This means that these domains can form spontaneously in the aquatic environment by striving to create a local hydrophobic nucleus with a polar surface (micelle-like system).
The statuses of individual chains and the domains present in them, treated as components of the entire structure, clearly differentiated the characteristics of subsequent domains. The status of the chain as a component of the complex was comparable to that of the entire complex. Among the domains, the status of the membrane domain was clearly different. Here, the values of both RD and K were clearly high, while in the other domains the values were much lower (Figure 6(B1)). This means that for the individual sections of the chain that make up the domains, there was a much better match to the T distribution characteristic of the aquatic environment. Attention was drawn to the C-terminal fragment, with its very low values of RD and K. This segment, constructed from seven short C-terminal fragments, has a status expressed by K = 0.0.
The analysis of the statuses of individual domains treated as individual structural units very clearly differentiated the N-terminal domain, i.e., the membrane domain. The remaining domains showed a status characteristic of the aquatic environment, with a hydrophobic nucleus and polar residues exposed on the surface. Some of these were engaged in interactions with analogous fragments of adjacent chains. It appears that a local excess of hydrophobicity in the individual D2 and D3 domains, e.g., positions 149-151, 215-222, and 234-236, is used for the purpose of complexing the adjacent chain, thus starting to form a larger complex. In the structure of the complex, sections 149-151 and 234-236 fit into a consistent order, while section 215-222, which showed a local excess in the structure of a single D3 domain, showed a hydrophobicity deficit in the structure of the DD3 domain, probably constituting a channel wall within the DD3 domain (Figures 4(B3) and 6(B3)).
The dissimilarity of the membrane domain in the form of both the DD1 and D1 complexes results from a marked excess of hydrophobicity over the entire section of this domain. This was present both in the form of a complex and in the single domain. Very high K values indicated a significant share of the environment and the environment-modifying factor for this domain. This suggests the need for the direct presence of a membrane to direct the shaping process towards the expected direction for the membrane-anchored domain. Although this statement is self-evident, it supports the correctness of the model used (Equation (4)). Table 2 lists two membrane proteins with similar biological functions. The authors define the status of the HpMscS (4HW9) protein as closed and that of EcMscS (4HWA) as open. Table 2 shows the locations of the differences between the statuses of these two forms, especially those observed in the structure of a single chain. However, this observation cannot be interpreted in the context of biological function due to the low degree of sequence identity (33%) [14]. Nevertheless, the structural analysis justifies the use of the FOD and FOD-M models to describe the structures from the MscS group.

Representative of Proteins from the MsbA Group
The structure of the transmembrane protein discussed here, called translocase (Lipid A export ATP-binding/permease protein MsbA), is the result of research on the adjustment of experimental conditions to enable soluble forms of membrane proteins to be obtained. The discussed structures were obtained in the environment of the β-dodecylmaltoside detergent, mimicking the membrane environment [15].
The analysis expressed by the parameters of the FOD and FOD-M models creates comparative possibilities for difficult-to-obtain experimental materials for research on membrane proteins.
The structures of the discussed proteins are homodimers, consisting only of the membrane domains described by the parameter set (Table 3). Very high values of both RD and K resulted from the fact that the structure was completely subordinated to the conditions of the membrane. Both the complexes and single chains and domains present in them showed significant divergences from the distributions expected for the structures characteristic of soluble proteins (Table 3).
From the point of view of the analysis based on the FOD and FOD-M models, both proposed structures showed a high degree of similarity to each other, differing slightly in terms of the parameter values themselves. However, these differences did not cause any discrepancy in interpretation.
For the analyses based on FOD and FOD-M, the protein presented is a very interesting example for the interpretation of the K parameter value. As mentioned before, a value of K > 1 is expected for membrane proteins. Values of K > 3 suggest a very high share of the membrane-like factor in shaping the structures of these proteins (Figure 7). The discussed example (in particular 6UZL) introduced a new observation resulting from the value of K > 3. It should be noted that the R distribution (without any differentiation) for this protein was obtained for K = 2.8. This means that the distribution for this K value represents a system with a uniform hydrophobicity distribution (Figure 7A).
In the set of T, O, and M profiles, attention was drawn to the sections where, instead of maxima in the T distribution, minima appeared in the M distribution. Mathematically, this was the result of very high K values (K > 3). In the case of this membrane protein (dimer), a local minimum appeared in the center, where the hydrophobicity maximum was expected according to the FOD model. This means that the model of the "inverse" Gaussian function applies here. All sections on the M profile (Figure 7A) are distinguished in the figures shown: ice blue for D1 and red for D2. The R distribution is also given in Figure 7A. The R distribution represents the state where the effect of the 3DG function balances the effect of the TMAX - Ti (1 - 3DG) function. This state was obtained for K = 2.8. The optimal value of K for the system was much larger. This demonstrates the advantage of the "inverse" Gaussian function. Interpretation of this observation suggested that polar residues are present in the central part. There are also hydrophobic residues which, due to the proximity of the free space of the channel, effectively exhibited a much lower level of hydrophobicity. As a result, an area (chain segment) was obtained with a clear hydrophobicity deficit, indicating the location of the channel. For the discussed protein (dimer), comparison with the idealized distribution consistent with 3DG, as observed in globular soluble proteins, gives an accurate picture of the situation of the membrane protein (hydrophobicity exposure) in the presence of a channel in the central part (polar residues) (Figure 7(B1,B2)).
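The balance between the two functions can be illustrated numerically. The sketch below assumes the form of the modified distribution commonly given for FOD-M, M_i = [T_i + K * [TMAX - T_i]_n]_n, where the index n denotes normalization to unit sum; this form is an assumption here and should be checked against Equation (4). The function names and the seven-point toy profile are hypothetical and do not represent any protein from this work. Under this assumption, M becomes uniform (the R distribution) at K = N * TMAX - 1, and for larger K the maximum of T turns into a minimum of M.

```python
def normalize(values):
    """Scale non-negative values so that they sum to 1."""
    total = sum(values)
    return [v / total for v in values]


def modified_distribution(t, k):
    """Assumed FOD-M form: M_i = [T_i + K * [T_max - T_i]_n]_n.

    Not defined for a perfectly uniform T (the inverse term would be all zeros).
    """
    t = normalize(t)
    t_max = max(t)
    inverse = normalize([t_max - ti for ti in t])  # "inverse" Gaussian term
    return normalize([ti + k * inv for ti, inv in zip(t, inverse)])


def flattening_k(t):
    """K at which M is uniform (equal to R) under the assumed form."""
    t = normalize(t)
    return len(t) * max(t) - 1.0


# Hypothetical bell-shaped toy profile standing in for a T distribution.
toy_t = [1, 2, 4, 6, 4, 2, 1]
k_flat = flattening_k(toy_t)

for k in (0.0, k_flat, 3.0):
    m = modified_distribution(toy_t, k)
    print(f"K = {k:.2f}:", [round(mi, 3) for mi in m])
```

Under the assumed form, the flattening value depends on the number of residues and on the peak of the T distribution. This would be consistent with different flattening values being reported for the dimer (K = 2.8) and for the single chain (K = 2.5), although that correspondence is not verified here.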
The assessment of the distributions in individual domains did not show this effect to such a strong extent, suggesting that the channel clearly appears only as a result of joining two chains (Figure 8). The characteristics of the T, O, and M distributions for a single chain revealed a significant excess of hydrophobicity along the entire length of the chain, thus expressing a structural system that was far from globular. There were clear deficiencies in hydrophobicity in the area of the ultimate location of the channel. The value of K = 2.5 was characteristic for the distribution of T, O, and M for a single chain, which in this case meant that the distribution M (optimal as a target for folding this chain) coincided with the distribution R (Figure 9A). The distribution M took the form of a straight line. This represents folding in an environment treated as a kind of "vacuum", i.e., no external factors had any influence on the formation of this chain, generating neither a hydrophobic nucleus nor its inverse. The profiles (Figure 9(B1,B2)) revealed different degrees of accordance between the T and O distributions, as measured by K, which was higher for the membrane domain (D1) and lower for the external domain (Figure 9(B2)).
Due to the K > 3 value for the complex and the R distribution obtained for appropriate modification of the T distribution, the protein discussed here is a valuable subject for FOD-based analysis and FOD-M modeling.

Protein with an O Distribution Consistent with the T Distribution
The last example discussed here is a DNA-binding protein called rap1, which is a domain of the BRC (Breast Cancer) protein (PDB ID 2L42 [16]). This protein was included in the present analysis as an example of a structure representing a highly compatible O distribution versus a T distribution with a low K value, and thus as an example different from those previously discussed. This was to allow (at least narrowly) comparative analysis with an example of a soluble protein with a hydrophobic nucleus and a polar surface.
The second reason this protein is interesting is that it is classified as disordered along its entire chain length (97 aa). The status of this single-chain protein in terms of the presence of the disordered form is discussed in the DisProt database [21,22].
In contrast, the FOD-based analysis model evaluated this protein as highly ordered from the point of view of the structure of the hydrophobic nucleus.
The parameters RD = 0.387 for the T-O-R relationship and a very low value of K = 0.2 suggested a high order of hydrophobicity in line with the micelle-like form, i.e., a typical arrangement characteristic of proteins that are soluble and fold under the influence and active participation of the aquatic environment.
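How such an RD value arises can be shown with a short calculation. The sketch below assumes the definition usually given in the FOD literature, RD = D_KL(O|T) / (D_KL(O|T) + D_KL(O|R)), where D_KL is the Kullback-Leibler divergence and R is the uniform distribution; this definition is not stated in this section and is taken as an assumption. The function names and the toy profiles are hypothetical. Under this definition, values below 0.5 mean that O is closer to T than to R.

```python
import math


def kl_divergence(p, q):
    """Kullback-Leibler divergence D_KL(p || q) of two discrete distributions.

    Both inputs must sum to 1, and q must be positive wherever p is positive.
    """
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def relative_distance(o, t):
    """Assumed FOD definition: RD = D_KL(O|T) / (D_KL(O|T) + D_KL(O|R))."""
    n = len(o)
    r = [1.0 / n] * n  # uniform reference distribution R
    d_ot = kl_divergence(o, t)
    d_or = kl_divergence(o, r)
    return d_ot / (d_ot + d_or)


# Hypothetical toy profiles: O follows T closely, so RD is expected below 0.5.
toy_t = [0.05, 0.10, 0.20, 0.30, 0.20, 0.10, 0.05]
toy_o = [0.06, 0.11, 0.19, 0.28, 0.19, 0.11, 0.06]
print("RD =", round(relative_distance(toy_o, toy_t), 3))
```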
The classification in the disordered protein category was due to a very low secondary structure content (only 29% of the 97 aa chain forms a secondary structure). The absence of disulfide bonds deprives the protein of the stabilization resulting from the presence of this type of covalent bond. The dominant source of stabilization is therefore the presence of a hydrophobic nucleus. Changes in the external environment, perhaps even small ones, could be a destabilizing factor for this protein. This probably explains the observed structural instability of this protein and its presence in the DisProt database.
In Figure 10A, the criterion of Ti and Oi above 0.01 was used as the compliance criterion for a high level of hydrophobicity (frames in Figure 10A). Thus, the composition of the hydrophobic nucleus shown in Figure 10B as a red form (space-filling) was identified. The navy blue fragments in this figure represent surface and intermediate-level residues. Thus, the micelle-like form present in the structure of the protein in question is made visible.
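The selection criterion can be written as a simple filter. The sketch below returns the positions at which both Ti and Oi exceed the 0.01 threshold; the function name and the six-value toy profiles are hypothetical and are not taken from the 2L42 profiles.

```python
def nucleus_residues(t, o, threshold=0.01):
    """Return 1-based positions where both T_i and O_i exceed the threshold."""
    return [
        position
        for position, (ti, oi) in enumerate(zip(t, o), start=1)
        if ti > threshold and oi > threshold
    ]


# Hypothetical fragment of T and O profiles (a slice of a longer chain).
toy_t = [0.002, 0.008, 0.015, 0.020, 0.012, 0.004]
toy_o = [0.003, 0.012, 0.014, 0.018, 0.009, 0.005]
print(nucleus_residues(toy_t, toy_o))
```

Requiring both values to exceed the threshold keeps only the residues for which the expected and the observed hydrophobicity agree on a high level, which is the agreement the frames in Figure 10A are described as marking.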

Discussion
The influence of the environment on protein folding is a critical factor in obtaining a structure with a specific biological activity. The model used here, based on the similarity of the folding process to the micellization process, results from the nature of amino acids as bipolar molecules that seek to generate a micelle-like structure. The degree to which such a form is obtained depends on the amino acid sequence, which in some cases precludes the generation of such a micelle-reproducing structure. Hence, local maladjustments which are difficult to predict appear to play significant roles in biological activity; in consequence, these maladjustments appear to be highly specific. A part of the protein body consistent with a micelle-like construction must be present to ensure a protein's solubility (for water-soluble proteins). These parts appear similar in many proteins, constructing the protein surface (polar residues). The specific, unique forms of maladjustment vary among proteins. This is the code for biological activity.
The phrase "sequence determines the structure of a protein" may be replaced with "sequence determines the form of maladjustment to a spherical micelle". Restoration of the micelle-like structure would result in the disappearance of any possibility of interaction (except for random polar and charge interactions). The type and degree of mismatch of the O distribution with the T distribution is of critical importance in determining the specificity of a given protein. This specificity also includes the susceptibility to the influence of the environment, including the participation of the nonpolar environment, in particular the hydrophobic environment of the membrane.
The application of the FOD and FOD-M models proposed here is not only aimed at assessing the status of a given molecule (complex), but also constitutes a proposal for a definition and mathematical record of the external field expressing a protein's environment. Including these models in the simulation of the polypeptide chain folding process will allow not only protein structures to be correctly predicted, but also the question of why proteins fold the way they do to be answered. The treatment of the M distribution as a "target" or "matrix" type factor in achieving the goal of appropriate hydrophobicity ordering (including disorder) in computer simulations of the protein-folding process should be helpful in appropriately orienting this process.
It is recommended to simulate the folding process in the presence of an external field with a variable K parameter in order to adjust the hydrophobicity distribution for various external conditions. The multiple criteria optimization postulated in [23], which takes into account two functions, (1) nonbinding interactions and (2) the matching of the order within the molecule to the active influence of the environment (for variable K values), seems to be a justified solution. This type of optimization leads to a solution expressing a consensus between these two factors. Several simulations should be performed for different K values to take into account different external conditions. The analysis presented here seems to support such an opinion.
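The second of the two functions can be sketched as a mismatch between the hydrophobicity distribution of a given conformation and the target M for a chosen K. The example below is only an illustration under stated assumptions: M is taken in the form commonly given for FOD-M, M_i = [T_i + K * [TMAX - T_i]_n]_n, the mismatch is measured by the Kullback-Leibler divergence, and all function names and toy data are hypothetical. The nonbinding interaction term and the optimization procedure of [23] are not represented.

```python
import math


def normalize(values):
    """Scale non-negative values so that they sum to 1."""
    total = sum(values)
    return [v / total for v in values]


def modified_distribution(t, k):
    """Assumed FOD-M target: M_i = [T_i + K * [T_max - T_i]_n]_n."""
    t = normalize(t)
    t_max = max(t)
    inverse = normalize([t_max - ti for ti in t])
    return normalize([ti + k * inv for ti, inv in zip(t, inverse)])


def environment_mismatch(o, t, k):
    """Hypothetical second criterion: D_KL(O || M(K)); lower means a better match."""
    m = modified_distribution(t, k)
    return sum(oi * math.log2(oi / mi) for oi, mi in zip(o, m) if oi > 0)


def scan_k(o, t, k_values):
    """Evaluate the mismatch for several K values (one entry per environment)."""
    return {k: environment_mismatch(o, t, k) for k in k_values}


# Toy data: an "observed" profile generated with K = 0.6, then scanned over K.
toy_t = [1, 2, 4, 6, 4, 2, 1]
toy_o = modified_distribution(toy_t, 0.6)
scores = scan_k(toy_o, toy_t, [0.0, 0.3, 0.6, 0.9])
print("best K:", min(scores, key=scores.get))
```

In a folding simulation, a value of this kind would be computed for each candidate conformation and weighed against the nonbinding interaction energy; how the two criteria are combined is left open here.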
It is also expected that the method presented here can be used by researchers of the biological activity of membrane proteins (and other proteins operating under conditions other than polar water).
The analysis presented here was made possible thanks to the evolving techniques of structure identification, which is difficult due to the nonaqueous environment excluding many experimental techniques [24][25][26][27][28]. Analyses are developed towards both the detailing [29] and generalization of protein structure [30]. Each available structure in PDB enables an analysis, such as the one presented in the present work, due to the necessary knowledge of the spatial position of each (heavy) atom [31].
Structural analyses are carried out from the point of view of the specificity of helical systems characteristic of membrane proteins [32][33][34]. The membrane itself, without which the membrane protein loses its specificity, is also an object of analysis; hence the analyses focused on the participation of detergents, membrane-mimicking factors, and other specifics that support the construction of membrane proteins and thereby their biological function [35,36]. The issues related to the identification of functional elements of the construction of complex structures, as well as the study of the influence of the membrane protein environment through the introduction of external factors other than the classic membrane, remain closely related to the model presented here [30,35,36].

Conclusions
The method proposed and used herein to assess the status of membrane proteins, based on the structure of the external field representing the conditions resulting from the specificity of the environment and external conditions, seems to be potentially widely applicable. Other proteins analyzed using the FOD model and its FOD-M modification justify this statement [9,13]. The proposed form of the external field can be used to describe and analyze the structure of any protein. It also proposes a form of mathematical notation of the specificity of the environment that actively influences the final form of the folding protein structure.
The aim of the paper was to demonstrate that results can be interpreted on the basis of the fuzzy oil drop model, particularly its modified version (FOD-M). Its application to many different proteins with different biological functions suggests the universal character of the presented model. The differentiation of the specificity of membrane and cytoplasmic domains shown in this paper, clearly identified by the FOD-M model, is a good example supporting this suggestion. Application of the FOD-M model to amyloids allows two scenarios of amyloid transformation to be differentiated [37,38].